Estimating Lexical Priors for Low-Frequency Syncretic Forms
نویسندگان
چکیده
Abstract Given a previously unseen form that is morphologically n-ways ambiguous, what is the best estimator for the lexical prior probabilities for the various functions of the form? We argue that the best estimator is provided by computing the relative frequencies of the various functions among the hapax legomena — the forms that occur exactly once in a corpus. This result has important implications for the development of stochastic morphological taggers, especially when some initial hand-tagging of a corpus is required: For predicting lexical priors for very low-frequency morphologically ambiguous types (most of which would not occur in any given corpus) one should concentrate on tagging a good representative sample of the hapax legomena, rather than extensively tagging words of all frequency ranges.
منابع مشابه
Estimating Lexical Priors for Low-Frequency Morphologically Ambiguous Forms
Given a form that is previously unseen in a sufficiently large training corpus, and that is morphologically n-ways ambiguous (serves n different lexical functions) what is the best estimator for the lexical prior probabilities for the various functions of the form? We argue that the best estimator is provided by computing the relative frequencies of the various functions among the hapax legomen...
متن کاملA Young EFL Learner’s Lexical Development through Different Input and Output Frequency Patterns
The present study was undertaken to investigate the effects of varying frequency patterns (FPs) of words on the productive acquisition of a young EFL learner in a home setting. Target words were presented to the learner using games and role plays. They were subsequently traced for their frequencies in input and output. Eighteen immediate tests and delayed tests were administered to measure the ...
متن کاملLexical Bundles in English Abstracts of Research Articles Written by Iranian Scholars: Examples from Humanities
This paper investigates a special type of recurrent expressions, lexical bundles, defined as a sequence of three or more words that co-occur frequently in a particular register (Biber et al., 1999). Considering the importance of this group of multi-word sequences in academic prose, this study explores the forms and syntactic structures of three- and four-word bundles in English abstracts writte...
متن کاملNative and Non-native Use of Lexical Bundles in Discussion Section of Political Science Articles
The study of lexical bundles, among types of text analysis, is gaining importance over the others in the last century. The present study employed a frequency-based analysis approach to the use of lexical bundles. The discussion section of 60 political science articles, with corpora around 253,063 words were investigated in three aspects of structure, form, and function of lexical bundles. The p...
متن کاملTree Biomass Estimation of Chinese fir (Cunninghamia lanceolata) Based on Bayesian Method
Chinese fir (Cunninghamia lanceolata (Lamb.) Hook.) is the most important conifer species for timber production with huge distribution area in southern China. Accurate estimation of biomass is required for accounting and monitoring Chinese forest carbon stocking. In the study, allometric equation W = a(D2H)b was used to analyze tree biomass of Chinese fir. The common methods for estimating allo...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره cmp-lg/9504015 شماره
صفحات -
تاریخ انتشار 1995